Welcome to Amethyst! The goal of Amethyst is to facilitate comprehensive analysis tools for single-cell methylation data. If you use this package or code on our Github for your work, please cite our manuscript.
If you are new to R, you may need to install some of the dependencies:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
library("BiocManager")
BiocManager::install(c("caret", "devtools", "data.table", "furrr", "future", "future.apply",
"ggplot2", "grDevices", "gridExtra", "igraph", "irlba", "janitor", "Matrix", "methods", "pheatmap",
"plotly", "plyr", "purrr", "randomForest", "rhdf5", "rtracklayer", "Rtsne", "scales", "stats", "stringr",
"tibble", "tidyr", "umap", "utils", "dplyr"))
Next, install a few additional dependencies found on Github, including amethyst itself.
devtools::install_github("JinmiaoChenLab/Rphenograph")
devtools::install_github("KrishnaswamyLab/MAGIC/Rmagic")
devtools::install_github("TomKellyGenetics/leiden")
devtools::install_github("lrylaarsdam/amethyst")
Now load libraries into R:
library(leiden)
library(amethyst)
library(data.table)
library(ggplot2)
library(dplyr)
library(tibble)
library(tidyr)
library(plyr)
library(future)
library(furrr)
library(purrr)
library(cowplot)
library(pheatmap)
This vignette comes with a toy dataset of 50 high-coverage cells derived from the human brain cortex. It is similar to the PBMC vignette, but contains some added features as the brain uniquely has very high levels of CH methylation in addition to CG. These two modalities operate by different principles and therefore necessitate tailored analysis approaches. However, processing CH methylation can take a long time, as there about half a billion sites in the human genome. To make this illustration quickly executable on any local computer, we have already done initial processing for computationally-intensive steps. These can be downloaded with a pre-made workspace in the next section. To do all steps from scratch, we recommend going through the CG-focused PBMC vignette.
Note: By default, data will download to the “~/Downloads/” folder. Change if a different directory is desired.
# h5 file containing base-level CG methylation information. This file is 12 GB and not necessarily needed for this tutorial, but available if desired.
# download.file("https://adeylabopen.s3.us-west-2.amazonaws.com/amethyst/brain_vignette.h5", "./brain_vignette.h5", method = "curl")
# workspace containing initial CH processing steps
download.file("https://adeylabopen.s3.us-west-2.amazonaws.com/amethyst/brain_vignette_workspace.RData", "~/Downloads/brain_vignette_workspace.RData", method = "curl")
We will also download some helpful metadata for each cell. The first file is a cellInfo.txt file, which is produced by the content_extract step of initial processing with Premethyst. The ScaleMethyl pipeline has an analagous file, allCells.csv, which is automatically loaded if you use the createScaleObject helper function (see Assembling the Amethyst object: ScaleMethyl-specific assembly instructions.)
# metadata files
download.file("https://raw.githubusercontent.com/lrylaarsdam/amethyst/main/vignettes/brain_vignette/brain_vignette_cellInfo.txt", "~/Downloads/brain_vignette_cellInfo.txt")
The second metadata file we will download is a .annot file, which is a manually assembled text file commonly used within the Adey lab. It contains the key for barcode sequences and associated samples. We are able to map this using knowledge of how the samples were distributed in the plate at the first barcoding step. This file is not essential to all workflows and is mostly a helper function for internal Adey lab use - the important thing is to get an idea for how we are assembling the metadata.
# metadata files
download.file("https://raw.githubusercontent.com/lrylaarsdam/amethyst/main/vignettes/brain_vignette/brain_vignette.annot", "~/Downloads/brain_vignette.annot", method = "curl")
First, we must construct an amethyst object. In R, an object structure is a convenient method for storing all our project information in one place. It stores paths to the h5 file, metadata, aggregated methylation values, results, and more. Each type of data goes in specific “slots” so Amethyst knows where to retrieve it. This has already been initialized with the createObject command in the process of pre-calculating computationally intenstive steps. Load the workspace to see a summary of the object so far. Please see the PBMC vignette for a detailed description of how to construct the object.
# obj <- createObject()
load("~/Downloads/brain_vignette_workspace.RData")
summary(obj)
## Length Class Mode
## 1 amethyst S4
Next, we need to add metadata about each cell. Useful metadata includes quality control metrics contained in the .cellInfo.txt intermediate output or .annot files, if using the Premethyst workflow.
# don't run; the output of these steps is already in your workspace
# obj <- addCellInfo(obj, file = "~/Downloads/brain_vignette_cellInfo.txt")
# obj <- addAnnot(obj, file = "~/Downloads/brain_vignette.annot", name = "method")
We also need to specify the location of the h5 file containing site-level methylation data for each barcode. In this case, every barcode belongs to the same h5 file, but an unlimited number of h5 files can be used in the same object.
Note: If you are combining experiments with overlapping barcodes, please see our vignette for addressing this. The h5paths slot is set up differently. Also, please note we have changed the heading from ‘paths’ (v0.0.0.9000) to ‘path’ (v1.0.0) to be more consistent, given the addition of the barcode and prefix columns.
obj@h5paths <- data.frame(barcode = rownames(obj@metadata), path = rep("~/Downloads/brain_vignette.h5", length(rownames(obj@metadata))))
It can be helpful to filter cells with outlying coverage values right away so downstream functions don’t unnecessarily perform calculations for cells that will not be used. First, view the coverage distribution - i.e., how many cytosine positions are captured per cell. We want to filter out cells that will have too sparse of information to properly cluster, or filter cells with outlying extremely high coverage values. With the commands below, we are simply plotting the distribution of coverage values in the metadata with ggplot, then filtering failed cells from the metadata with dplyr logic.
Note: Vignette data has been pre-filtered. We recommend cells have a minimum of 1M cytosines covered, if possible. Filter by any additional metrics in your metadata as needed. For example, in this brain data, I am further filtering out cells with < 12% mCH. This threshold was chosen because it is about double the expected maximum %mCH that has been shown based on previous research, so if a cell had higher %mCH than this, something likely went wrong with the processing (like incomplete bisulfite conversion).
ggplot(obj@metadata, aes(x = cov)) + geom_histogram(bins = 10) + theme_classic()
obj@metadata <- obj@metadata |> dplyr::filter(cov > 1000000 & mch_pct < 12)
The next step is to cluster cells, which we typically do based on methylation values over fixed genomic windows. Depending on the size and depth of your dataset, calculating methylation levels over features can be a computationally intensive step. There are three ways to do this (only do one):
Note: Clustering methods for single-cell methylation data, as a newer modality, will require more hands-on interpretation of results. Please thoroughly check that your results are not driven by technical issues (like coverage bias) and expect that this step will take some optimization.
The most straightforward method is to calculate methylation levels over genomic windows with Amethyst in R.
First, perform an initial indexing step, which determines the locations corresponding to each chromosome in every cell’s h5 file. This reduces the computational load by only calculating across one at a time. The output is a list of data frames. Each data frame corresponds to the where the base-level info starts and ends for each chromosome and cell.
Note: Many functions in Amethyst can be parallelized by increasing the threads parameter, but this is often not practical on a local computer. Do not increase the thread number for this vignette, but increase when performing initial computationally heavy steps on a server.
# don't run; the output of these steps is already in your workspace
# obj@index[["chr_cg"]] <- indexChr(obj, type = "CG", threads = 1)
Now that you have made the index, use makeWindows to calculate methylation levels over large genomic windows. The output is a data frame of windows (rows) x cells (columns). There are three options for the “metric” parameter: - “percent”, which is just percent methylated cytosines out of 100 - “ratio”, which is the methylation level of the window divided by the global average methylation level - “score”, which is a scaled measure of deviation from baseline - Formally, score is calculated as: (mCGfeature – mCGglobal)/(1 - mCGglobal) if mCGfeature – mCGglobal > 0, and (mCGfeature – mCGglobal)/(mCGglobal) if mCGfeature – mCGglobal < 0. The important thing to remember with “score” is -1 corresponds to an entirely unmethylated region, 0 is the same as average methylation, and 1 is fully methylated.
Note: If you are using non-conventional chromosomes, please
provide a whitelist (see function parameters).
Note: You may have to copy/paste any code reading the h5 file (e.g.,
makeWindows) directly into the console instead of running the chunk, if
using the .Rmd template. Note: An NA value indicates no
cytosines were captured for that given feature.
The code below has been run for you to save time. Results are already in your workspace.
# don't run; the output of these steps is already in your workspace
# obj@genomeMatrices[["cg_100k_score"]] <- makeWindows(obj, stepsize = 100000, type = "CG", metric = "score", threads = 1, index = "chr_cg", nmin = 2)
# obj@index[["chr_ch"]] <- indexChr(obj, type = "CH", threads = 1)
# obj@genomeMatrices[["ch_100k_pct"]] <- makeWindows(obj, stepsize = 100000, type = "CH", metric = "percent", threads = 1, index = "chr_ch", nmin = 5)
Note: Review section 2 for reference, but note that workflow continues at “clustering with Amethyst or Facet - continued” section
makeWindows can be memory-intensive. If you have very large datasets, it may be preferable to apply Facet, a Python-based helper script to compute windows in a more efficient manner. Results are stored in the h5 file and loaded as needed. For installation instructions and examples, see Facet documentation. The test dataset already has 100kb window values calculated using the command below.
# don't run; example of how to compute aggregated windows with facet
# facet agg -u 100000 brain_vignette.h5
If pre-computed windows are stored in the h5 file, load to the genomeMatrices slot with loadWindows. An indexing step is not needed for Facet.
# don't run; example of how to load aggregated windows with facet
# obj@genomeMatrices[["cg_100k_score_facet"]] <- loadWindows(obj, name = "100000", nmin = 2, metric = "score")
You may want to remove windows where most values are NA. In this case, since the vignette data is high coverage and the genomic windows are large, I am going to filter for at least 90% of the cells have values in that window. The appropriate threshold will highly depend on how big the windows are and how many cells you have. We will also remove chrY windows because they tend to be sparse and can induce clustering artifacts.
Note: Please adjust this threshold in a dataset-specific manner. If you use short windows or have lower-coverage data, a 90% threshold is way too high.
# remove windows with < 90% observed values
obj@genomeMatrices[["cg_100k_score"]] <- obj@genomeMatrices[["cg_100k_score"]][rowSums(!is.na(obj@genomeMatrices[["cg_100k_score"]])) >= nrow(obj@metadata)*.9, ]
obj@genomeMatrices[["ch_100k_pct"]] <- obj@genomeMatrices[["ch_100k_pct"]][rowSums(!is.na(obj@genomeMatrices[["ch_100k_pct"]])) >= nrow(obj@metadata)*.9, ]
# remove chrY windows
obj@genomeMatrices[["cg_100k_score"]] <- obj@genomeMatrices[["cg_100k_score"]][!sapply(rownames(obj@genomeMatrices[["cg_100k_score"]]), function(name) grepl("chrY", name)), ]
obj@genomeMatrices[["ch_100k_pct"]] <- obj@genomeMatrices[["ch_100k_pct"]][!sapply(rownames(obj@genomeMatrices[["ch_100k_pct"]]), function(name) grepl("chrY", name)), ]
Next, perform dimensionality reduction. If you are unsure how many dimensions to use, the dimEstimate function can estimate the number needed to explain the desired variance threshold. In this example, we will reduce the data from “cg_100k_score” and “ch_100k_pct” into five dimensions using the irlba package, which performs fast truncated singular value decomposition. Feel free to implement alternative methods if desired. The output will be a data frame of cells (rows) by reduced dimensions (columns) with the corresponding component values.
set.seed(111)
# name the result whatever you want. I like descriptive names, at the cost of length.
obj@reductions[["irlba_ch100k_cg100k"]] <- runIrlba(obj, genomeMatrices = c("ch_100k_pct", "cg_100k_score"), dims = c(5, 5), replaceNA = c("mch_pct", 0))
Now determine cluster membership with runCluster. Either a Louvain-based or Leiden-based strategy can be used. Assign the output column name, which will be added to the metadata (or leave blank and it will be assigned “cluster_id”). This update was implemented so one can save multiple iterations within the same object.
Note: In this example, k is low because brain_vignette.h5 has 50 cells. Increase for larger datasets, depending on number of cells and expected heterogeneity. For example, we would typically use values of 30-100 for datasets of 1k-100k cells. A general rule of thumb is k = sqrt(n)/2. Note: I am setting the seed to try and make results reproducible. It is not essential, but good practice for your own data. Annoyingly, different R versions produce different random results with the same seed. I am using v4.3.0. Don’t worry if your projections look slightly different.
set.seed(111)
# run clustering with louvain-based method
obj <- runCluster(obj, k = 10, reduction = "irlba_ch100k_cg100k", method = "louvain", colname = "louvain_cluster")
obj <- runCluster(obj, k = 10, reduction = "irlba_ch100k_cg100k", method = "leiden", colname = "leiden_cluster")
You can use the head function to look at how results are stored:
head(obj@metadata)
## cov cg_cov mcg_pct ch_cov mch_pct method
## ATTGAGGATAACGCTTCGTCCGCCGATC 106462778 4372087 77.09 86598316 3.09 LA
## ATTGAGGATAACGCTTCGTCTAAGGTCA 116628882 4956483 75.33 94834293 5.30 LA
## ATTGAGGATACATTACCAAGACCTTGGC 102852337 4536119 75.48 83270322 5.99 LA
## ATTGAGGATACCAATTCCATCACGAGCG 104380792 4490347 70.24 84576070 1.21 LA
## ATTGAGGATACCAATTCCATTTGCCTAG 105518604 4616346 75.88 85112793 5.80 LA
## ATTGAGGATACTACTGCAAGATGGCATG 110343538 4985043 72.10 88886888 1.75 LA
## louvain_cluster leiden_cluster
## ATTGAGGATAACGCTTCGTCCGCCGATC 4 4
## ATTGAGGATAACGCTTCGTCTAAGGTCA 1 2
## ATTGAGGATACATTACCAAGACCTTGGC 1 2
## ATTGAGGATACCAATTCCATCACGAGCG 2 5
## ATTGAGGATACCAATTCCATTTGCCTAG 1 2
## ATTGAGGATACTACTGCAAGATGGCATG 3 1
Next, project to 2D with runUmap and/or runTsne. Assign the output to a reductions slot. Like the runCluster update, this allows multiple projections to be stored within the same object. The results are 2D UMAP or TSNE coordinates with column names “dim_x” and “dim_y”. Future plotting functions like dimFeature will depend on this structure.
Note: Neighbors/perplexity values are small because this dataset is 50 cells. Increase for larger datasets, depending on number of cells and expected heterogeneity. A good starting point is k = sqrt(n)/2.
set.seed(111)
# name the result whatever you want. I like descriptive names, at the cost of length.
obj@reductions[["umap_irlba_ch100k_cg100k"]] <- runUmap(obj, neighbors = 5, dist = 0.05, method = "euclidean", reduction = "irlba_ch100k_cg100k")
obj@reductions[["tsne_irlba_ch100k_cg100k"]] <- runTsne(obj, perplexity = 5, method = "euclidean", theta = 0.2, reduction = "irlba_ch100k_cg100k")
There are many methods out there for clustering. One may wish to implement an alternative strategy, like clustering with MethSCAn variably methylated regions. This worked quite well for our small, high-coverage test dataset. Results from other methods can easily be dropped in to an Amethyst object and analysis can proceed as normal. Please see our vignette for example alternative clustering methods. In the end, all you have to do is assign the output to a new reductions result:
# don't run; theoretical example of 2D umap projection from methscan vmrs
# obj@reductions[["umap_methscan_vmrs"]] <- methscan_result
# colnames(obj@reductions[["umap_methscan_vmrs"]]) <- c("dim_x", "dim_y")
First, plot the dimensionality reduction results with dimFeature. Each point is a cell. x and y coordinates are derived from the UMAP or TSNE results we just calculated above. Points can then be colored by any variable in the metadata, allowing assessment of how different parameters are driving results. For example, use colorBy = louvain_cluster to show how the different group assignments are distributed.
p1 <- dimFeature(obj, colorBy = louvain_cluster, reduction = "umap_irlba_ch100k_cg100k", pointSize = .8) + ggtitle("UMAP CH 100k pct + CG 100k score; Louvain")
p2 <- dimFeature(obj, colorBy = leiden_cluster, reduction = "tsne_irlba_ch100k_cg100k", pointSize = .8) + ggtitle("TSNE CH 100k pct + CG 100k score; Leiden")
plot_grid(p1, p2)
dimFeature and many other visualization functions use ggplot logic, so you can easily modify plots as needed. For example, here we will facet the dimFeature results by the “batch” variable in obj@metadata.
# Adding a random variable "batch" to illustrate faceting
set.seed(111)
obj@metadata$batch <- sample(1:3, nrow(obj@metadata), replace = TRUE)
dimFeature(obj, colorBy = louvain_cluster, reduction = "umap_irlba_ch100k_cg100k", pointSize = .8) + facet_wrap(vars(batch), nrow = 1)
It is often helpful to see the distribution of metadata variables between samples. The command below will plot the cluster proportion distribution for each batch in a bar chart. The width of each color will correspond to the proportion of grouping metadata devoted to each cluster. In this context, we want to check that the proportions are relatively even between clusters to make sure that there is not drastic variation between batches. Again, plots can be easily modified with ggplot command logic.
Note: Since there are so few cells, the variation in proportion between batches is not concerning.
sampleComp(obj, groupBy = "batch", colorBy = "louvain_cluster")
If you want to make your plots look nicer, Amethyst provides many built-in color palettes. The function testPalette will show you what each option will look like with the required number of colors (one for each cluster, in this case).
testPalette(output = "swatch", n = length(unique(obj@metadata$louvain_cluster)))
If there is one you like, it is helpful to store the vector as a variable like “pal” for future use. Below we are illustrating how to store the pallete and recall it for use with dimFeature.
library(ggrepel)
group_labels <- merge(obj@metadata, obj@reductions[["umap_irlba_ch100k_cg100k"]], by = 0) |> dplyr::group_by(louvain_cluster) |> dplyr::summarise(dim_x = mean(dim_x), dim_y = mean(dim_y))
pal <- makePalette(option = 13, n = 5)
dimFeature(obj, colorBy = louvain_cluster, colors = pal, pointSize = .8, reduction = "umap_irlba_ch100k_cg100k") +
geom_text_repel(data = group_labels, aes(x = dim_x, y = dim_y, label = louvain_cluster))
In addition to cluster distribution, dimFeature is useful for visualizing how different parameters in the cellInfo file are distributed throughout the UMAP. The plots below are colored by coverage and global %mCH. From global %mCH alone we can tell right away that clusters 1 and 5 are neurons (neurons are known to have high non-CG methylation). In the groups that have lower %mCH levels, we can hypothesize that 2 are microglia, 3 are oligodendrocytes, and 4 are astrocytes. These predictions comes from previous research on non-CG methylation levels in different neural populations.
p1 <- dimFeature(obj, reduction = "umap_irlba_ch100k_cg100k", colorBy = log(cov), pointSize = .8) +
scale_color_gradientn(colors = c("black", "turquoise", "gold", "red")) + ggtitle("Coverage distribution")
p2 <- dimFeature(obj, reduction = "umap_irlba_ch100k_cg100k", colorBy = log(mch_pct), pointSize = .8) +
scale_color_gradientn(colors = c("black", "turquoise", "gold", "red")) + ggtitle("Global %mCH distribution")
plot_grid(p1, p2)
Now that we have clusters, the next step is determining their biological identity. This can be quite challenging for a novel modality like methylation. In addition to leveraging global %mCH, there are a couple ways we recommend going about this:
If you have an in-depth knowledge of the system, one useful method is to look at mCG hypomethylation over canonical marker genes. The first step is to load an annotation file for the reference genome so Amethyst knows the coordinates for each gene.
obj@ref <- makeRef("hg38")
Next, calculate methylation levels in short genomic windows for each cluster.
Deeper explanation: While calcSmoothedWindows sounds similar to makeWindows, they serve very different purposes. makeWindows returns a window (rows) x cell (columns) data frame for aggregated methylation values over very large regions. Here, we want to see methylation levels at a very fine resolution, which requires calculation over short regions. It is not feasible to make a window x cell data frame this big because there would be millions of windows per cell, threatening R’s internal vector limits. So, we pseudobulk by a metadata variable like cluster. We also smooth by averaging with both adjacent windows to try and alleviate some of the sparsity challenges that occur with methylation data. The output is a list of two data tables: “pct_matrix” with the pseudobulked, smoothed, %m over short windows; and “sum_matrix” with the sum of c (methylated) and t (unmethylated) counts use to produce “pct_matrix”. Though we won’t use the “sum_matrix” immediately, keep it, as we will need it for differentially methylated region testing.
We recommend 500bp windows for real data, but 1000 are
used here since the vignette dataset is only 50
cells.
Note: To avoid the hassle of recomputing these every time clusters
are adjusted, one can also calculate sliding smoothed windows with Facet
for each cell and store in the h5 file, then re-calculate/load more
rapidly per group with loadSmoothedWindows. See ?loadSmoothedWindows for
parameter specifications.
cluster1kbwindows_cg <- calcSmoothedWindows(obj,
type = "CG",
threads = 1,
step = 1000, # change to 500 for real data
smooth = 3,
genome = "hg38",
index = "chr_cg",
groupBy = "louvain_cluster",
returnSumMatrix = TRUE, # save sum matrix for DMR analysis
returnPctMatrix = TRUE)
Add the percent matrix output to the tracks slot. This will be used for visualizing methylation patterns over genomic regions with histograM or heatMap functions.
obj@tracks[["cg_cluster_tracks"]] <- cluster1kbwindows_cg[["pct_matrix"]]
Now you can view methylation patterns over key marker genes with the heatMap function. This is a very versatile function and one of the best ways to view methylation tracks in Amethyst, so we recommend becoming acquainted with the parameters. Shown below are mean smoothed methylation levels per short genomic window with hypomethylated regions in blue and hypermethylated regions in light grey. Rows are pseudobulked values for each distinct population in the groupBy parameter. Below each plot, the gene body is shown, with orientation indicated by the arrow, exons in black rectangles, and the putative promoter (start +/- 1500 bp) in pink.
Note: By default, the arrow hangs over the gene by 3k bp so you can see it after the last exon. This can be adjusted with the arrowOverhang parameter.
heatMap(obj,
genes = c("SLC17A7", # excitatory glutamatergic neurons
"DLX1", # inhibitory gabaergic neurons
"GAD1", # inhibitory gabaergic neurons
"C1QA", # microglia
"MOG", # oligodendrocytes
"SLC1A3" # astrocytes
),
track = "cg_cluster_tracks",
nrow = 3,
legend = T)
The heatMap results supports our hypotheses of cluster identity based on global %mCH: group 1 and 5 are neurons and 2-4 are glia. mCG patterns over canonical marker genes provide further resolution as well: 1 are likely excitatory neurons (SLC17A7+), 2 are microglia (C1QA+), 3 are oligodendrocytes (MOG+), 4 are astrocytes (SLC1A3), and 5 are inhibitory neurons (DLX1+, GAD1+).
Note: Not all genes will have distinct hypomethylation patterns over the expressing group. Many will have similar hypomethylation patterns despite established variation in transcription. Interpret with caution.
If you want to simultaneously view other information - like regulatory elements - these can be plotted as “tracks” below. In this example, we will use Encode candidate cis-regulatory elements (cCREs). The track structure should be a named list of data tables with chr, start, and end columns. First, download and split tracks into the expected structure.
# note this is for hg38 build only
library(rtracklayer)
download.file("https://hgdownload.soe.ucsc.edu/gbdb/hg38/encode3/ccre/encodeCcreCombined.bb", "~/Downloads/encodeCcreCombined.bb")
ccre <- as.data.table(rtracklayer::import("~/Downloads/encodeCcreCombined.bb"))
setnames(ccre, "seqnames", "chr")
ccre_tracks <- split(ccre, ccre$ucscLabel)
ccre_tracks <- lapply(ccre_tracks, function(x) {x[,.(chr, start, end)]})
Now plot tracks under the heatMaps by providing the list to the extraTracks parameter of heatMap:
heatMap(obj,
genes = "SLC1A3",
track = "cg_cluster_tracks",
legend = T,
extraTracks = ccre_tracks)
As you can see from the heatMaps, promoters are sometimes universally hypomethylated, or transposed from the predicted site. This helps illustrate the complexity of methylation patterns over gene bodies. Because of this, we prefer these tools to visualize over the whole gene body and surrounding context as opposed to aggregated metrics.
In addition to heatMap, you can also view methylation levels with the histograM function. This function has a similar effect, but methylation percentages are indicated as bar heights instead of colored tiles. Sometimes this is more helpful for illustrating patterns. Please see function documentation (with ?histograM) for a full list of which parameters can be changed.
histograM(obj,
baseline = "mean", # can plot from either mean or 0; see documentation
orientation = "rows",
genes = "SATB2",
track = "cg_cluster_tracks",
legend = T)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
The brain uniquely has extremely high levels of mCH in addition to mCG, which can provide further information for cell type annotation. For example, glia have very little mCH, allowing for rapid distinction of neuronal vs glial groups. In addition, mCH has been shown to accumulate most variably across canonical marker genes distinguishing closely related subtypes. Gene body mCH accumulation is anticorrelated with gene expression.
To highlight differences between CG and CH, we will first look at broad levels over the whole genome at once with heatMapGenome. This function was constructed specifically for CH because levels uniquely fluctuate across megabase-scale regions thought to be guided by topologically-associated domain boundaries.
As a reference, we will first use heatMapGenome to look at 100kb levels in the CG context. In this plot, data for each chromosome is shown in a separate row with the x value corresponding to position. Within those rows, mean score value per 100kb window for each cluster is shown, where blue is hypomethylated.
# First, aggregate CG levels by cluster
obj@genomeMatrices[["cg_100k_score_cluster"]] <- aggregateMatrix(obj = obj,
matrix = "cg_100k_score",
groupBy = "louvain_cluster",
minCells = 2, # raise for real data
minValues = 2) # raise for real data
# Then plot
heatMapGenome(obj, matrix = "cg_100k_score_cluster", colors = c("#005eff", "#9daec9", "#cccccc", "#dbdbdb"))
Notice how there is subtle variation, but it relatively stable, particularly across groups. Next, we will plot the CH context.
# First, aggregate CH levels by cluster
obj@genomeMatrices[["ch_100k_pct_cluster"]] <- aggregateMatrix(obj = obj,
matrix = "ch_100k_pct",
groupBy = "louvain_cluster",
minCells = 2, # raise for real data
minValues = 2) # raise for real data
# Then plot
heatMapGenome(obj, matrix = "ch_100k_pct_cluster", colors = c("black", "red", "yellow"))
Notice how: 1) groups 2-4 have very low global %mCH. These are likely glia (which we already determined). 2) There are megabase-scale fluctuations in CH levels, some of which are highly disparate between groups. It’s more apparent in a larger dataset. Let’s zoom in with heatMap to see an example. First, we must make cluster tracks for mCH, which was already done for you.
# don't run - already run and in workspace
# cluster1kbwindows_ch <- calcSmoothedWindows(obj,
# type = "CH",
# threads = 0,
# step = 1000, # change to 500 for real data unless you have really low coverage
# smooth = 3,
# genome = "hg38",
# index = "chr_ch",
# groupBy = "louvain_cluster",
# returnSumMatrix = TRUE, # save sum matrix for DMR analysis
# returnPctMatrix = TRUE)
Define the percent matrix component of the result to the tracks slot.
obj@tracks[["ch_cluster_tracks"]] <- cluster1kbwindows_ch[["pct_matrix"]]
Now use the heatMap function to zoom in on chr2:170500000-172800000 by defining the regions parameter. Notice the disparate mCH levels between groups 1 and 5 over DLX genes, GAD1, and RAPGEF4-AS1.
heatMap(obj,
regions = "chr2_170500000_172800000",
trackOverhang = 0,
remove = "^ENS|^MT", # optional; uses grep to remove genes that may be outside the question of interest and cluttering up the view
track = "ch_cluster_tracks")
Hopefully this helps illustrate how mCH fluctuates both over large-scale domains and gene boundaries.
We will next calculate %mCH over protein coding genes. First, extract the protein coding gene names from the reference annotation file. Next, use makeWindows to calculate %mCH over these genes. Again, this has been done for you in the interest of time.
# don't run - already run and in workspace
# protein_coding <- unique(obj@ref |> dplyr::filter(type == "gene" & gene_type == "protein_coding" & seqid != "chrM") |> dplyr::pull(gene_name))
# obj@genomeMatrices[["gene_ch"]] <- makeWindows(obj, genes = protein_coding, type = "CH", metric = "percent", threads = 1, index = "chr_ch", nmin = 2)
We can then plot the mCH levels over gene bodies with functions like dotM. This plotting tool accepts two modalities - one will determine dot size and the other color. Because dotM accepts two matrices for aesthetics, we will first calculate a normalized %mCH level based on average gene values using the sweep function.
obj@genomeMatrices[["gene_ch_norm"]] <- sweep(obj@genomeMatrices[["gene_ch"]], 2, colMeans(obj@genomeMatrices[["gene_ch"]], na.rm = T), FUN="/")
dotM(obj, genes = c("SATB2", "LINGO1", "SV2B", "CTTNBP2", "BROX", "GAD1", "DNER", "SLC6A1"),
groupBy = "louvain_cluster", sizeMatrix = "gene_ch", colorMatrix = "gene_ch_norm") + scale_size(range = c(1, 10))
Based on the variable levels of SATB2 (highly expressed in excitatory neurons) and GAD1 (highly expressed in inhibitory neurons), we can clearly tell from this dot plot that cluster 5 is inhibitory and 1 is excitatory neurons. Our manuscript also explores rare hyper-mCH genes in glia, such as CTTNBP2 in astrocytes (cluster 4) and BROX in oligodendrocytes (cluster 3). That leaves cluster 2 as likely microglia since they have hardly any mCH.
Let’s explore a couple other methods to view gene body %mCH, like dimM or violinM.
dimM plots the value of a gene in a given input matrix over the corresponding cell’s UMAP/TSNE coordinates. This is particularly helpful when looking at %mCH over genes in the brain, which can be used to distinguish subtype identity.
dimM(obj,
genes = c("SATB2", "GAD1"),
reduction = "umap_irlba_ch100k_cg100k", # name of dim reduction to plot
matrix = "gene_ch", # matrix containing the values of interest
pointSize = .8,
squish = 7, # upper bound of color scale; helpful for noisy data
colors = c("#dbdbdb", "#cccccc", "#265A61", "#0C1E20"))
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
violinM uses the same values as input, but plots the results in a violin plot grouped by a parameter in the metadata.
violinM(obj, genes = c("SATB2", "GAD1"), matrix = "gene_ch", groupBy = "louvain_cluster", colors = pal)
It can also be helpful to use a less directed approach when determining differences between groups. There are functions to help with this like findClusterMarkers, which takes an input matrix and metadata grouping variable, then tests if any features are differentially methylated in that grouping variable relative to others. Below is an example findClusterMarkers call. This will be especially useful for finding cluster-specific %mCH genes. For the sake of time, this has already been run for you and is in the workspace.
# don't run - already in workspace
# cluster_markers <- findClusterMarkers(obj,
# matrix = "gene_ch",
# groupBy = "louvain_cluster",
# method = "bonferroni",
# genes = rownames(obj@genomeMatrices[["gene_ch"]]),
# nmin = 5, # increase to at least 10 for larger datasets
# threads = 1)
# cluster_markers <- cluster_markers |> dplyr::filter(p.adj < 0.05)
To select top results, I like to use a combined rank metric of lowest adjusted p value and highest absolute logFC. This helps ameliorate artifacts generated by either approach.
top_markers_mCH <- cluster_markers |>
dplyr::group_by(cluster_id, direction) |>
dplyr::arrange(p.adj, .by_group = TRUE) |> dplyr::mutate(rank_padj = 1:n()) |>
dplyr::arrange(desc(abs(logFC)), .by_group = TRUE) |> dplyr::mutate(rank_logFC = 1:n()) |>
rowwise() |> dplyr::mutate(total_rank = sum(rank_padj, rank_logFC)) |>
group_by(cluster_id, direction) |> slice_min(n = 5, order_by = total_rank)
Now plot some top results with dotM, using the mean %mCH over gene bodies as the size aesthetic, and the color as the gene %mCH normalized to the global %mCH. This further supports our putative annotations. For example, QKI - which is known to be hyper-mCH in excitatory neurons and hypo-mCH in inhibitory - is hypermethylated in group 1 and hypomethylated in group 5.
dotM(obj, genes = top_markers_mCH$gene, sizeMatrix = "gene_ch", colorMatrix = "gene_ch_norm", groupBy = "louvain_cluster")
Exploration of mCH over key marker genes is a powerful method for annotating brain data. It is the most neuronal subtype-specific epigenetic modality known.
Another method for cell type annotation is by comparison to an annotated reference. While few exist, there are high-quality references available for brain data. We have aggregated mCH across 5,972 key genes from PMC9004682. This reference can be downloaded from Github. First, download this data, then calculate average methylation levels per cluster for each gene with aggregateMatrix. The result will be a window (rows) x groups (columns) matrix with mean values calculated for each feature by the grouping variable.
download.file("https://raw.githubusercontent.com/lrylaarsdam/amethyst/main/vignettes/brain_vignette/PMC9004682_5972genes_BA10_ref.RData", "~/Downloads/PMC9004682_5972genes_BA10_ref.RData", method = "curl")
ref <- readRDS("~/Downloads/PMC9004682_5972genes_BA10_ref.RData")
obj@genomeMatrices[["gene_ch_cluster"]] <- aggregateMatrix(obj,
matrix = "gene_ch",
groupBy = "louvain_cluster",
minCells = 2, # raise for real data
minValues = 2) # raise for real data)
Now see how aggregated mCH profiles from the reference compare to those of our clusters. First, merge together common windows from the pre-annotated data and our clusters. Then use the cor function to calculate Pearson’s r for pairwise complete observations. Finally, use pheatmap to plot comparisons across datasets.
cor <- cor(merge(ref,
obj@genomeMatrices[["gene_ch_cluster"]],
by = 0) |> tibble::column_to_rownames(var = "Row.names"), use = "pairwise.complete.obs")
cor <- cor[c(1:ncol(ref)), c((ncol(ref) + 1)):ncol(cor)]
pheatmap(cor)
As previously suspected, cluster 1 is a mixed population of excitatory neurons, and cluster 5 inhibitory neurons. The other 3 are glia, which we also determined from global %mCH levels and %mCG over canonical marker genes. Based on all these annotation tools, we can rename our clusters according to broad class using dplyr logic.
obj@metadata[["type"]] <- dplyr::recode(obj@metadata[["louvain_cluster"]],
"1" = "Exc",
"2" = "Micro",
"3" = "Oligo",
"4" = "Astro",
"5" = "Inh")
group_labels <- merge(obj@metadata, obj@reductions[["umap_irlba_ch100k_cg100k"]], by = 0) |> dplyr::group_by(type) |> dplyr::summarise(dim_x = mean(dim_x), dim_y = mean(dim_y))
dimFeature(obj, colorBy = type, colors = pal, reduction = "umap_irlba_ch100k_cg100k", pointSize = .8) +
geom_text_repel(data = group_labels, aes(x = dim_x, y = dim_y, label = type))
You might also want to rename cluster tracks with the group annotations. Recalculate, or just copy the previous one and name like so:
obj@tracks[["cg_type_tracks"]] <- copy(obj@tracks[["cg_cluster_tracks"]])
setnames(obj@tracks[["cg_type_tracks"]], c("chr", "start", "end", "Astro", "Exc", "Micro", "Oligo", "Inh"))
obj@tracks[["ch_type_tracks"]] <- copy(obj@tracks[["ch_cluster_tracks"]])
setnames(obj@tracks[["ch_type_tracks"]], c("chr", "start", "end", "Astro", "Exc", "Micro", "Oligo", "Inh"))
Now, when we call the renamed track, heatMaps will be labeled with the corresponding population:
heatMap(obj,
genes = c("SLC17A7", # excitatory glutamatergic neurons
"DLX1", # inhibitory gabaergic neurons
"GAD1", # inhibitory gabaergic neurons
"C1QA", # microglia
"MOG", # oligodendrocytes
"SLC1A3" # astrocytes
),
track = "cg_type_tracks",
nrow = 3,
legend = F)
At the core of single-cell methylation analysis is the identification of differentially methylated regions (DMRs). There are two main formats to set up DMR analysis. The first is to test DMRs for each cluster/type against all others. Only the sum matrix (which we saved at the calcSmoothedWindows step) is needed for the testDMR function. Alternatively, you could regenerate the calcSmoothedWindows output with your annotated cell types. Here we are using the CG data, but this can directly be applied to CH as well.
Note: A bug was recently found in testDMR. We recommend re-running testDMR for any analysis done with versions before v1.0.2. We sincerely apologize for the inconvenience.
dmrs_cg <- testDMR(cluster1kbwindows_cg[["sum_matrix"]], # Sum of c and t observations in each genomic window per group
eachVsAll = TRUE, # If TRUE, each group found in the sumMatrix will be tested against all others
nminTotal = 5, # Min number observations across all groups to include the region in calculations
nminGroup = 5) # Min number observations across either members or nonmembers to include the region in calculations
## Finished group 4
## Finished group 1
## Finished group 2
## Finished group 3
## Finished group 5
The output is the result of all tests. For each window and group, “c” is the total count of methylated observations, and “t” is the total count of unmethylated observations. A column for each test has been added with the resulting logFC and p value. However, this is not in a very user-friendly format. The output of testDMR is not really meant to be used as-is - it’s just an intermediate so one does not have to re-compute all tests if an alteration in stringency parameters is needed.
Next, expand and filter the resulting list according to the desired stringency.
dmrs_cg <- filterDMR(dmrs_cg,
method = "bonferroni", # c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr")
filter = TRUE, # If TRUE, removes insignificant results
pThreshold = 0.01, # Maxmimum adjusted p value to allow if filter = TRUE
logThreshold = 1.5) # Minimum absolute value of the log2FC to allow if filter = TRUE
head(dmrs_cg)
## chr start end pval logFC padj direction test
## <char> <num> <num> <num> <num> <num> <char> <char>
## 1: chr1 1351000 1352000 1.348357e-12 -1.9606 1.858562e-05 hypo 4_all
## 2: chr1 1472000 1473000 6.826521e-14 -1.7574 9.409610e-07 hypo 4_all
## 3: chr1 2527000 2528000 7.827937e-12 -2.1370 1.078995e-04 hypo 4_all
## 4: chr1 2528000 2529000 6.432860e-14 -1.9806 8.866992e-07 hypo 4_all
## 5: chr1 2530000 2531000 2.383553e-13 -1.9886 3.285467e-06 hypo 4_all
## 6: chr1 2531000 2532000 1.673948e-18 -2.7884 2.307353e-11 hypo 4_all
Often, adjacent genomic windows are differentially methylated. It can be helpful to collapse them into one locus with collapseDMR. As a helpful bonus, if annotation = T, any overlapping genes will be noted in the output results table.
Note: Please pay attention to the maxDist and minLength parameters and decide if these are appropriate for your data. Here, we are combining any DMRs within 2kb of each other, and applying a minimum threshold of 2kb to count as a DMR.
collapsed_dmrs_cg <- collapseDMR(obj,
dmrs_cg,
maxDist = 4000, # Max allowable overlap between DMRs to be considered adjacent
minLength = 2000, # Min length of collapsed DMR window to include in the output
reduce = T, # Reduce results to unique observations (recommended)
annotate = T) # Add column with overlapping gene names
head(collapsed_dmrs_cg)
## Key: <chr, dmr_start, dmr_end>
## chr dmr_start dmr_end test direction dmr_length dmr_padj dmr_logFC
## <char> <num> <num> <char> <char> <num> <num> <num>
## 1: chr1 906000 908000 2_all hypo 2000 2.046375e-07 -1.603200
## 2: chr1 925000 927000 1_all hyper 2000 6.959487e-08 1.580600
## 3: chr1 997000 1000000 2_all hypo 3000 3.489881e-09 -2.301933
## 4: chr1 1304000 1307000 3_all hypo 3000 1.844188e-09 -1.967933
## 5: chr1 1305000 1308000 2_all hyper 3000 6.211256e-10 2.200767
## 6: chr1 1305000 1309000 1_all hypo 4000 1.498810e-04 -2.036967
## gene_names
## <char>
## 1: ENSG00000272438
## 2: SAMD11
## 3: ENSG00000272512, HES4
## 4: ACAP3
## 5: ACAP3
## 6: ACAP3, PUSL1
In the output, the “test” column indicates which comparison the DMR was identified in. For convenience, we will add an annotation column “type” to indicate which population the DMR corresponds to.
collapsed_dmrs_cg$type <- dplyr::recode(collapsed_dmrs_cg$test,
"1_all" = "Exc",
"2_all" = "Micro",
"3_all" = "Oligo",
"4_all" = "Astro",
"5_all" = "Inh")
Often, specific comparisons are desired, like testing between control and treatment groups of a specific cell type. A data frame can be provided describing the tests. Three columns should be included: One listing members of group A, one listing members of group B, and one with the name of the test. As an example, below is a comparison table structure for testing CH DMRs in excitatory vs. inhibitory neurons.
Note: The “name”, “A”, and “B” columns are fixed; do not change or else testDMR won’t be able to extract the appropriate comparisons. Please also note that conditions for each test are concatenated with a single comma (no spaces in between) and wrapped in double quotes.
# not necessary to run for the completion of this vignette
comparisons <- data.frame(
stringsAsFactors = FALSE,
name = c("exc_vs_inh"),
A = c("1"),
B = c("5")
)
dmrs_ch <- testDMR(sumMatrix = cluster1kbwindows_ch[["sum_matrix"]], comparisons = comparisons, nminTotal = 5, nminGroup = 5)
##
## Finished testing exc_vs_inh: 1 vs. 5
dmrs_ch <- filterDMR(dmrs_ch, method = "bonferroni", filter = TRUE, pThreshold = 0.01, logThreshold = 2)
collapsed_dmrs_ch <- collapseDMR(obj, dmrs_ch, maxDist = 4000, minLength = 2000, reduce = T, annotate = T)
After DMR testing, you should have a table of hypo and hypermethylated regions for each cluster. First, let’s look at how many DMRs were identified in each group with a simple ggplot command. We will plot a bar chart showing the number of DMRs identfied for each condition.
ggplot(collapsed_dmrs_cg |> dplyr::group_by(type, direction) |> dplyr::summarise(n = n()),
aes(y = type, x = n, fill = type)) + geom_col() +
facet_grid(vars(direction), scales = "free_y") + scale_fill_manual(values = pal) + theme_classic()
## `summarise()` has grouped output by 'type'. You can override using the
## `.groups` argument.
It can be helpful to isolate top results per condition. I find it helpful to select by a combined metric of logFC and padj, because selecting by either metric alone tends to yield unwanted noise.
top_dmrs_cg <- collapsed_dmrs_cg |>
dplyr::group_by(type, direction) |>
dplyr::arrange(dmr_padj, .by_group = TRUE) |> dplyr::mutate(rank_padj = 1:n()) |>
dplyr::arrange(desc(abs(dmr_logFC)), .by_group = TRUE) |> dplyr::mutate(rank_logFC = 1:n()) |>
rowwise() |> dplyr::mutate(total_rank = sum(rank_padj, rank_logFC)) |>
group_by(type, direction) |> slice_min(n = 1, order_by = total_rank) |>
dplyr::mutate(location = paste0(chr, "_", (dmr_start - 2000), "_", (dmr_end + 2000))) |> dplyr::arrange(direction)
top_dmrs_ch <- collapsed_dmrs_ch |>
dplyr::group_by(direction) |>
dplyr::arrange(dmr_padj, .by_group = TRUE) |> dplyr::mutate(rank_padj = 1:n()) |>
dplyr::arrange(desc(abs(dmr_logFC)), .by_group = TRUE) |> dplyr::mutate(rank_logFC = 1:n()) |>
rowwise() |> dplyr::mutate(total_rank = sum(rank_padj, rank_logFC)) |>
group_by(direction) |> slice_min(n = 3, order_by = total_rank) |>
dplyr::mutate(location = paste0(chr, "_", (dmr_start - 2000), "_", (dmr_end + 2000))) |> dplyr::arrange(direction)
Now use heatMap to plot top CG DMRs for each group. Plot the DMRs as tracks underneath. Some fall over canonical marker genes, supporting the biological relevance of results. For example, The top inhibitory neuron hypomethylated region is over the NRXN3 gene, which plays critical roles in inhibitory synaptic specification and transmission.
dmr_cg_hypo_tracks <- split(collapsed_dmrs_cg |> dplyr::filter(direction == "hypo") |> ungroup() |> dplyr::select(chr, dmr_start, dmr_end, type) |> dplyr::rename(start = dmr_start, end = dmr_end) |> data.table(), collapsed_dmrs_cg[collapsed_dmrs_cg$direction == "hypo"]$type)
heatMap(obj,
track = "cg_type_tracks",
regions = top_dmrs_cg$location[top_dmrs_cg$direction == "hypo"],
nrow = 3,
arrowOverhang = 0,
legend = F,
extraTracks = dmr_cg_hypo_tracks,
extraTrackColors = pal)
Now use heatMap to plot top CH DMRs for each group. Note that a top hyper-mCH region for excitatory neurons is over DLX6, which is involved in inhibitory specification, supporting the hypothesis that mCH represses key genes distinguishing neuronal subtypes.
dmr_ch_tracks <- split(collapsed_dmrs_ch |> dplyr::select(chr, dmr_start, dmr_end, test) |> dplyr::rename(start = dmr_start, end = dmr_end) |> data.table(), collapsed_dmrs_ch$test)
heatMap(obj,
track = "ch_type_tracks",
regions = top_dmrs_ch$location,
colorMax = 15,
nrow = 3,
arrowOverhang = 0,
extraTracks = dmr_ch_tracks)
Further interpretation of the results can be explored using a wide variety of packages available on R. n this example, we will use the topGO package to test for Gene Ontology (GO) term enrichments for genes with hypomethylated regions in the excitatory neuron group. First, extract the background list (all genes), then extract the query list from the DMR table.
Note: topGO is not a dependency of Amethyst. You may have to install this package separately.
# install if needed
BiocManager::install("topGO")
## Warning: package(s) not installed when version(s) same as or greater than current; use
## `force = TRUE` to re-install: 'topGO'
BiocManager::install("org.Hs.eg.db")
## Warning: package(s) not installed when version(s) same as or greater than current; use
## `force = TRUE` to re-install: 'org.Hs.eg.db'
# load library
library(topGO)
## Warning: package 'AnnotationDbi' was built under R version 4.3.1
# extract background and query gene lists
background <- unique(obj@ref |> dplyr::filter(type == "gene" & gene_type == "protein_coding" & seqid != "chrM") |> dplyr::pull(gene_name))
query <- unlist(strsplit(collapsed_dmrs_cg$gene_names[collapsed_dmrs_cg$type == "Exc" & collapsed_dmrs_cg$direction == "hypo"], ", "))
# run topGO analysis
GOdata <- new("topGOdata",
description = "GO Enrichment Analysis",
ontology = "BP",
allGenes = setNames(factor(as.integer(background %in% query), levels = c(0, 1)), background),
geneSel = function(x) x == 1,
nodeSize = 10,
annot = annFUN.org,
mapping = "org.Hs.eg.db",
ID = "symbol")
resultElim <- runTest(GOdata, algorithm = "elim", statistic = "fisher")
resultElim <- GenTable(GOdata, Fisher = resultElim, topNodes = 500, numChar = 60)
# filter results
resultElim <- resultElim |> dplyr::filter(Fisher < 0.01 & Significant > 7) |> dplyr::mutate(fold_change = Significant/Expected, Fisher = as.numeric(Fisher)) |> dplyr::filter(fold_change > 2)
resultElim <- janitor::clean_names(resultElim)
Use ggplot to note top results. Here we will just plot GO terms according to fold change (bar width) and adjusted p value (color) in a bar chart. As expected, top results are strongly related to excitatory synaptic transmission.
ggplot(resultElim, aes(x = fold_change, y = reorder(term, fold_change), fill = fisher)) + geom_col() + theme_classic() + scale_fill_viridis_c(direction = -1)
GO analysis can often produce hundreds of results. Selecting top candidates by either fold change or p value both have caveats. Cherry picking is misleading. To view results more holistically, packages like rrvgo can be helpful. We will first find parent terms and plot the proportion of results devoted to each with a treemapPlot. Notice how this helps illustrate that the vast majority of results are related to a couple terms with strong involvement in neuronal function.
Note: rrvgo is not a dependency of Amethyst. You may have to install this package separately.
# install if needed
BiocManager::install("rrvgo")
suppressMessages(library(rrvgo))
## Warning: package 'rrvgo' was built under R version 4.3.1
simMatrix <- calculateSimMatrix(resultElim$go_id, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
## preparing gene to GO mapping data...
## preparing IC data...
scores <- setNames(-log10(resultElim$fisher), resultElim$go_id)
reducedTerms <- reduceSimMatrix(simMatrix,scores, threshold=0.9, orgdb="org.Hs.eg.db")
treemapPlot(reducedTerms)
Finally, save your workspace so you don’t have to re-calculate everything in the future.
save.image("~/Downloads/brain_vignette_workspace.RData")
Thanks for trying out Amethyst! Additional utilities are still to come. We are also open to suggestions. If you use this package or code on our Github for your work, please cite our manuscript.
Good luck in your analysis!